Model-driven engineering is the automatic production of software artefacts from abstract models
of structure and functionality. By targeting a specific class of system, it is possible to automate 
aspects of the development process, using model transformations and code generators that encode 
domain knowledge and implementation strategies. Using this approach, questions of correctness for a 
complex software system may be answered through analysis of abstract models of lower complexity,
under the assumption that the transformations and generators employed are themselves correct. This 
paper shows how formal techniques can be used to establish the correctness of model transformations 
used in the generation of software components from precise object models. The source language is 
based upon existing, formal techniques; the target language is the widely-used SQL notation for 
database programming. Correctness is established by giving comparable, relational semantics to 
both languages, and checking that the transformations are semantics-preserving. 



1 Introduction 

Our society is increasingly dependent upon the behaviour of complex software systems. Errors in the 
design and implementation of these systems can have significant consequences. In August 2012, a 'fairly 
major bug' in the trading software used by Knight Capital Group lost that firm $461m in 45 minutes.
A software glitch in the anti-lock braking system caused Toyota to recall more than 400,000 vehicles in 
2010 [25]; the total cost to the company of this and other software-related recalls in the same period 
is estimated at $3bn. In October 2008, 103 people were injured, 12 of them seriously, when a Qantas 
airliner [3] dived repeatedly as the fly-by-wire software responded inappropriately to data from inertial
reference sensors. A modern car contains the product of over 100 million lines of source code, and
in the aerospace industry, it has been claimed that "the current development process is reaching the limit
of affordability of building safe aircraft".

The solution to the problems of increasing software complexity lies in the automatic generation of 
correct, lower-level software from higher-level descriptions: precise models of structure and functional- 
ity. The intention is that the same generation process should apply across a class of systems, or at least 
multiple versions of the same system. Once this process has been correctly implemented, we can be 
sure that the behaviour of the generated system will correspond to the descriptions given in the models. 
These models are strictly more abstract than the generated system, easier to understand and update, and 
more amenable to automatic analysis. This model-driven approach [11] makes it easier to achieve correct
designs and correct implementations. Despite the obvious appeal of the approach, and that of related
approaches such as domain-specific languages [8] and software product lines [18], much of the code
that could be generated automatically is still written by hand; even where precise, abstract specifications
exist, their implementation remains a time-consuming, error-prone, manual process.

The reason for the delay in uptake is simple: in any particular development project, the cost of producing
a new language and matching generator is likely to exceed that of producing the code by hand.
As suitable languages and generators become available, this situation is changing, with significant impli- 
cations for the development of complex, critical, software systems. In the past, developers would work 
to check the correctness of code written in a general-purpose programming language, such as C or Ada, 
against natural language descriptions of intended functionality, illuminated with diagrams and perhaps a 
precise, mathematical account of certain properties. In the future, they will check the correctness of more 
abstract models of structure and behaviour, written in a range of different, domain-specific languages; 
and rather than relying upon the correctness of a single, widely-used compiler, they will need to rely 
upon the correctness of many different code generators. The correctness of these generators, usually 
implemented as a sequence of model transformations, is thus a major, future concern. 

In this paper, we present an approach to model-driven development that is based upon formal, mathe- 
matical languages and techniques. The objective is the correct design and implementation of components 
with complex state, perhaps comprising a large number of inter-related data objects. The approach is par- 
ticularly applicable to the iterative design and deployment of systems in which data integrity is a primary 
concern. The modelling language employed has the familiar, structural features of object notations such 
as UML — classes, attributes, and associations — but uses logical predicates to characterise operations. An 
initial stage of transformation replaces these predicates with guarded commands that are guaranteed to 
satisfy the specified constraints: see, for example, [24]. The focus here is upon the subsequent generation
of executable code, and the means by which we may prove that this generation process is correct. 

The underlying thesis of the approach is that the increasing sophistication of software systems is often 
reflected more in the complexity of data models than in the algorithmic complexity of the operations 
themselves. The intended effect of a given event or action is often entirely straightforward. However, 
the intention may be only part of the story: there may be combinations of inputs and before-states where
the operation, as described, would leave the system in an inconsistent after-state; there may be other 
attributes to be updated; there may be constraints upon the values of other attributes that need to be taken 
into account. Furthermore, even if the after-state is perfectly consistent, the change in state may have 
made some other operation, or sequence of operations, inapplicable. 

Fortunately, where the intended effect of an operation upon the state of a system is straightforward, 
it should be possible to express this effect as a predicate relating before and after values and generate a 
candidate implementation. Using formal techniques, we may then calculate the domain of applicability 
of this operation, given the representational and integrity constraints of the data model. If this is smaller 
than required, then a further iteration of design is called for; if not, then the generated implementation 
is guaranteed to work as intended. In either case, code may be generated to throw an exception, or 
otherwise block execution, should the operation be called outside its domain. Further, by comparing the 
possible outcomes with the calculated domains of other operations, we can determine whether or not one 
operation can affect the availability of others. 

The application of formal techniques at a modelling level — to predicates, and to candidate imple- 
mentations described as abstract programs — has clear advantages. The formal semantics of a modern 
programming language, considered in the context of a particular hardware or virtual machine platform, 
is rich enough to make retrospective formal analysis impractical. If we are able to establish correctness 
at the modelling level, and rely upon the correctness of our generation process, then we may achieve the 
level of formal assurance envisaged in new standards for certification: in particular, DO-178C [21]. We 
show here how the correctness of the process can be established: in Section 3, we present the underlying
semantic framework; in Section 4, the translation of expressions; in Section 5, the implementation of
operations; in Section 6, the approach to verification.

2 Preliminaries 

The BOOSTER language is an object modelling notation in which model constraints and operations
are described as first-order predicates upon attributes and input values. Operations may be composed 
using the logical combinators: conjunction, disjunction, implication, and both flavours of quantification. 
They may be composed also using relational composition, as sequential phases of a single operation 
or transaction. The constraints describing operations are translated automatically into programs written 
in an extended version of the Generalised Substitution Language (GSL), introduced as part of the B 
Method. There may be circumstances under which a program would violate the model constraints,
representing business rules, critical requirements, or semantic integrity properties. Accordingly, a guard 
is calculated for each operation, as the weakest precondition for the corresponding, generated program 
to maintain the model constraints. The result is an abstract program whose correctness is guaranteed, in 
a language defined by the following grammar: 

Substitution ::= skip | ⟨PATH⟩ := ⟨Expression⟩
               | ⟨Predicate⟩ → ⟨Substitution⟩ | ⟨Substitution⟩ || ⟨Substitution⟩
               | ⟨Substitution⟩ ; ⟨Substitution⟩ | ⟨Substitution⟩ □ ⟨Substitution⟩
               | ! ⟨Variable⟩ : ⟨Expression⟩ • ⟨Substitution⟩
               | @ ⟨Variable⟩ : ⟨Expression⟩ • ⟨Substitution⟩

Here, the usual notation of assignable variables is replaced with paths, each being a sequence of at- 
tribute names, using the familiar object-oriented 'dot' notation as a separator. Predicate and Expression 
represent, respectively, first-order predicates and relational and arithmetic expressions, skip denotes
termination, := denotes assignment, and → denotes a program guard: to be implemented as an assertion,
a blocking condition, or as (the complement of) an exception. □ denotes alternation, and @ denotes 
selection: the program should be executed for exactly one of the possible values of the bound variable. 
Similarly, || denotes parallel composition, with ! as its generalised form: all of the program instances 
should be performed, in parallel, as a single transaction. ; denotes relational or sequential composition. 
Inputs and outputs to operations need not be explicitly declared; instead, they are indicated using the 
decorations ? and ! at the end of the attribute name. 

These abstract programs are interpreted as operations at a component applications programming 
interface (API), with the data model of the component given by a collection of class and association 
declarations in the usual object-oriented style. The integrity constraints and business rules for the data 
model can be given as predicates in the same notation, or using the object constraint language (OCL) of 
the Unified Modelling Language (UML).

As a simple, running example, consider the following description of (a fragment of) the data model 
for a hotel reservations system:

class Hotel {
  attributes
    reservations : seq(Reservation.host) [*] }

class Reservation {
  attributes
    host : Hotel.reservations }

A single hotel may be the host for any number of reservations. It may also be the host of a number 
of rooms and allocations: see the class association graph [9] of Figure 1. The action of creating a new
reservation may be specified using a simple operation predicate in the context of the Hotel class: 

reserve { #allocations < limit & reservations' = reservations ^ <r!> & r!.room = m? }
Figure 1: Hotel Reservation System (HRS): Graph of Class Associations


This requires that a new reservation be created and appended to the existing list, modelled as an ordered 
association from Hotel to Room, and that the room involved is given by input m?. The operation should 
not be allowed if the number of reservations in the system has already reached a specified limit. 

If the constructor operation predicate on Reservation mentions a set of dates dates?, then this 
will be added as a further input parameter. We might expect to find also a constraint insisting that any 
two different reservations associated with the same room should have disjoint sets of dates, and perhaps 
constraints upon the number of reservations that can be held by a particular traveller for the same date. For
the purposes of this paper, however, we will focus simply upon the required, consequential actions and 
the description of the operation as an abstract program:

reserve {
    r! : extent(Reservation) & dates? : set(Date) & m? : extent(Room)
    & card(allocations) < limit
  →
    r!.dates := r!.dates \/ dates? || r!.status := "unconfirmed"
    || r!.host := this || reservations := ins(reservations, #reservations + 1, r!)
    || r!.room := m? || m?.reservations := ins(m?.reservations, #m?.reservations + 1, r!) }

In this abstract program, the two reservations attributes, in the hotel and room objects, are updated 
with a reference to the new reservation, the dates attribute of the new reservation is updated to include 
the supplied dates, and the status attribute is set to "unconfirmed", presumably as a consequence of the 
constructor predicate for the Reservation class. 
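
The grammar of abstract programs given in this section can be transcribed as a Haskell data type. The following is only a minimal sketch under stated assumptions, not the metamodel used by the authors: the constructor names, the simplified Path, Expression and Predicate types, and the helper assigned are all hypothetical and introduced purely for illustration.

```haskell
-- Hypothetical, simplified stand-ins for the paper's PATH, Expression and Predicate.
type Path = [String]                      -- attribute names, e.g. ["r!", "host"]

data Expression
  = PathE Path
  | StrE String
  | Card Expression                       -- card(e) or #e
  deriving (Eq, Show)

data Predicate
  = Lt Expression Expression
  | Conj Predicate Predicate
  deriving (Eq, Show)

-- One constructor per production of the grammar.
data Substitution
  = Skip
  | Assign Path Expression                -- path := e
  | Guard Predicate Substitution          -- p --> S
  | Par Substitution Substitution         -- S || T
  | Seq Substitution Substitution         -- S ; T
  | Choice Substitution Substitution      -- S [] T
  | All String Expression Substitution    -- ! x : e . S
  | Any String Expression Substitution    -- @ x : e . S
  deriving (Eq, Show)

-- Collect the paths assigned to anywhere in a program (hypothetical helper).
assigned :: Substitution -> [Path]
assigned Skip          = []
assigned (Assign p _)  = [p]
assigned (Guard _ s)   = assigned s
assigned (Par s t)     = assigned s ++ assigned t
assigned (Seq s t)     = assigned s ++ assigned t
assigned (Choice s t)  = assigned s ++ assigned t
assigned (All _ _ s)   = assigned s
assigned (Any _ _ s)   = assigned s

-- A fragment of reserve: the guard, then two of its parallel assignments.
reserveFragment :: Substitution
reserveFragment =
  Guard (Lt (Card (PathE ["allocations"])) (PathE ["limit"]))
        (Par (Assign ["r!", "status"] (StrE "unconfirmed"))
             (Assign ["r!", "host"] (PathE ["this"])))
```

Representing each combinator as its own constructor keeps later transformations structurally recursive: a function over Substitution needs one equation per production, as assigned illustrates.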

3 A Unified Implementation and Semantic Framework 

To illustrate our formal, model-driven approach, we will consider the case in which the target is a re- 
lational database platform. The above program would then be translated into a SQL query, acting on 
a relational equivalent of our original object model. The transformations can be described using the 
Haskell [2] functional programming language: in the diagram of Figure 2, thin-lined, unshaded boxes
denote Haskell program data types, and thin arrows the executable transformations between
them. These constitute an implementation framework. The thick-lined, shaded boxes denote the relational
semantics of corresponding data types, thick lines with circles at one end the process of assigning
a formal meaning, and arrows with circles at each end the relationship between formalised concepts.
These constitute a corresponding semantic framework for establishing correctness.
Figure 2: Booster Model to SQL Database: Implementation & Semantic Framework

Four kinds of models are involved in our transformation pipeline: 1) a BOOSTER model, in extended
GSL notation, generated from the original predicates; 2) an Object model representing an
object-oriented relational semantics for that model; 3) an intermediate Table model reflecting our
implementation strategy; 4) a SQL model expressed in terms of tables, queries, and key constraints. A final
model-to-text transformation will be applied to generate a well-formed SQL database schema.

We use Haskell to define metamodels of model structures and operations as data types. Our transfor- 
mations are then defined as Haskell functions: from Booster to Object, then to Table, and finally 
to SQL. Our relational semantics is most easily described using the Z notation [26]. Other formal lan- 
guages with a transformational semantics would suffice for the characterisation of model and operation 
constraints, but Z has the distinct advantage that each operation, and each relation, may be described 
as a single predicate, rather than, for example, a combination of separate pre- and post-conditions; this
facilitates operation composition, and hence a compositional approach to verification. 
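
The shape of the pipeline described above, with each transformation a Haskell function and the whole generator their composition, can be sketched as follows. This is an illustrative assumption rather than the authors' code: the four model types are placeholders that carry only class names, and boosterToObj, tabToSql, sqlToText and generate are hypothetical names (objToTab is the only name taken from the text).

```haskell
-- Placeholder metamodels: each carries only a list of class names.
newtype Booster = Booster [String]
newtype Object  = Object  [String]
newtype Table   = Table   [String]
newtype Sql     = Sql     [String]

-- Stub stages; the real transformations rewrite paths, expressions and operations.
boosterToObj :: Booster -> Object
boosterToObj (Booster cs) = Object cs

objToTab :: Object -> Table
objToTab (Object cs) = Table cs

tabToSql :: Table -> Sql
tabToSql (Table cs) = Sql cs

-- Model-to-text step: one stub CREATE TABLE statement per class.
sqlToText :: Sql -> String
sqlToText (Sql cs) =
  unlines [ "CREATE TABLE " ++ c ++ " (oid INTEGER PRIMARY KEY);" | c <- cs ]

-- The generator is the composition of the stages, applied right to left.
generate :: Booster -> String
generate = sqlToText . tabToSql . objToTab . boosterToObj
```

Because each stage is a pure function between distinct types, the compiler rejects any attempt to skip or reorder stages, and each stage can be verified against its own semantic mapping in isolation.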



4 Path & Expression Transformation 

The descriptions of operations in the Booster, Object, and Table models are all written in the GSL 
language; the difference between them lies in the representation of attribute and association references. 
Instead of creating three versions of a language type Substitution, one for each of the reference notations,
we employ a type PATH as a generic solution: see Figure 3.

Figure 3: Datatypes of Behavioural Models

We define 

data PATH = BPath BPATH | OPath OPATH | TPath TPATH 

where BPath, OPath, and TPath are data constructors. A BOOSTER model path (of type BPATH) is
represented as a sequence $\langle a_1, \ldots, a_n \rangle$ of name references to attributes/properties. We will refer to this
range $1 \ldots n$ of indices when explaining the corresponding Object and Table model paths.

We consider structures of the types OPATH and TPATH in detail. Paths of type OPATH are used to 
indicate explicitly which properties/classes are accessed, along with its chain of navigation starting from 
the current class. 



data OPATH     = BaseOPath REF_START | RecOPath OPATH TARGET
data REF_START = ThisRef BASE | SCRef IDEN_PROPERTY EXPRESSION BASE
data TARGET    = EntityTarget IDEN_PROPERTY | SCTarget IDEN_PROPERTY EXPRESSION
data BASE      = ClassBase N_CLASS | SetBase N_SET | IntBase | StrBase

type IDEN_PROPERTY = (N_CLASS, N_ATTRIBUTE)



An object path is a left-heavy binary tree, where the left-most child refers to its starting reference and
all right children represent target classes/properties that are accessed. The starting reference of an object
path (which denotes access to, e.g., the current object or an element of a sequence-valued property
through indexing) provides explicit information about the base type of that reference. All intermediate
targets and the ending target of an object path contextualise the properties with their enclosing classes
(i.e. IDEN_PROPERTY).

For each context path $\langle a_1, \ldots, a_i \rangle$, where $1 \le i \le n-1$, an Object model path (of type OPATH)
identifies a target class C; if the source BOOSTER path is valid, then attribute $a_{i+1}$ must have been
declared in C.
Example object path. As an example of how the transformation on paths works in practice, consider the
Account class (Figure 1). The path this.owner.reglist denotes a list of registered
hotels and has its OPATH counterpart:

RecOPath (RecOPath (BaseOPath (ThisRef (ClassBase Account)))
                   (EntityTarget (Account, owner)))
         (EntityTarget (Traveller, reglist))

where RecOPath and BaseOPath are constructors for, respectively, recursive and base Object paths.
EntityTarget and ClassBase construct type information about the three context paths: Account for
this, (Account, owner) for this.owner, and (Traveller, reglist) for this.owner.reglist.
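
The object path for this.owner.reglist can be written as a value of the OPATH type and traversed recursively. The sketch below assumes that names are plain strings and reduces EXPRESSION to an integer literal; both simplifications, and the helper targets, are hypothetical and not part of the authors' metamodel.

```haskell
type N_CLASS       = String
type N_ATTRIBUTE   = String
type N_SET         = String
type IDEN_PROPERTY = (N_CLASS, N_ATTRIBUTE)

-- Index expressions are reduced to integer literals in this sketch.
newtype EXPRESSION = IntLit Int deriving (Eq, Show)

data BASE      = ClassBase N_CLASS | SetBase N_SET | IntBase | StrBase
  deriving (Eq, Show)
data REF_START = ThisRef BASE | SCRef IDEN_PROPERTY EXPRESSION BASE
  deriving (Eq, Show)
data TARGET    = EntityTarget IDEN_PROPERTY | SCTarget IDEN_PROPERTY EXPRESSION
  deriving (Eq, Show)
data OPATH     = BaseOPath REF_START | RecOPath OPATH TARGET
  deriving (Eq, Show)

-- The path this.owner.reglist, starting in class Account.
example :: OPATH
example =
  RecOPath (RecOPath (BaseOPath (ThisRef (ClassBase "Account")))
                     (EntityTarget ("Account", "owner")))
           (EntityTarget ("Traveller", "reglist"))

-- Properties visited by a path, in navigation order (hypothetical helper).
targets :: OPATH -> [IDEN_PROPERTY]
targets (BaseOPath _)  = []
targets (RecOPath p t) = targets p ++ [prop t]
  where
    prop (EntityTarget ip) = ip
    prop (SCTarget ip _)   = ip
```

The left-heavy shape means the recursion in targets always descends through the prefix first, so properties come out in the order in which they are navigated.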

On the other hand, we use a path of type TPATH to indicate, for each navigation to a property in the
Object model, the corresponding access to a table which stores that property:

data TPATH    = BaseTPath REF_START
              | RecTPath TPATH T_ACCESS
data T_ACCESS = ClassTAccess IDEN_PROPERTY
              | AssocTAccess IDEN_PROPERTY
              | SetTAccess IDEN_PROPERTY
              | SeqTAccess IDEN_PROPERTY               -- retrieve all indexed components
              | SeqTCAccess IDEN_PROPERTY EXPRESSION   -- retrieve an indexed component


A table path is left-heavy (as is an OPATH), where the left-most child refers to its starting reference
and all right children represent target tables that are accessed. The starting reference of a table path
provides exactly the same information as its OPATH counterpart (i.e. REF_START). All intermediate targets and the
ending target of a table path denote accesses to a variety of tables, predicated upon our implementation
strategy. When the target property is sequence-valued, we distinguish between the two cases where one
of its indexed components is to be accessed (SeqTCAccess) and where all indexed components are to be
accessed (SeqTAccess).

For each attribute $a_i$, where $1 \le i \le n$, a Table model path (of type TPATH) recursively records
in which sort of table (e.g. class tables, association tables, or set tables) it is stored, based on the target class
of its context path.

Example table path. The above Object path has its TPATH counterpart:

RecTPath (RecTPath (BaseTPath (ThisRef (ClassBase Account)))
                   (AssocTAccess (Account, owner)))
         (AssocTAccess (Traveller, reglist))

where RecTPath and BaseTPath construct, respectively, recursive and base Table paths. Properties
owner and reglist are accessed in the two corresponding association tables.

Path transformation. We now specify the above OPATH-to-TPATH transformation in Haskell: 

objToTabPath :: OBJECT_MODEL -> PATH -> PATH
objToTabPath om (OPath opath) = TPath (objToTabPath' om opath)

where the first line declares a function objToTabPath, and the second line gives its definition: matching 
an input object model as om and an input path as (OPath opath), whereas the RHS constructs a new 
PATH via TPath. The transformation of object paths is given by:

objToTabPath' om (RecOPath op tar) =
  case tar of
    EntityTarget (c, p)
      | (c, p) `elem` biAssoc' om c     -> RecTPath tp (AssocTAccess (c, p))
      | (c, p) `elem` classTables tm c  -> RecTPath tp (ClassTAccess (c, p))
      | (c, p) `elem` setTables tm      -> RecTPath tp (SetTAccess (c, p))
      | (c, p) `elem` seqTables tm      -> RecTPath tp (SeqTAccess (c, p))
    SCTarget (c, p) oe ->
      let te = objToTabExpr om oe in RecTPath tp (SeqTCAccess (c, p) te)
  where
    tm = objToTab om
    tp = objToTabPath' om op

where each condition specified between | and -> denotes a special case of the matched entity target,
consisting of class c and property p. For example, the condition (c, p) `elem` classTables tm c
denotes properties that are stored in the table for class c.

Each recursive object path is structured as (RecOPath op tar), where op is its prefix (i.e. context) of 
type OPATH, which we recursively transform into a table path equivalent tp; and tar is its target property. 
For each given tar, table access is determined by checking membership against various domains: a 
bidirectional association will be accessed by means of an association table. If the target property is 
sequence-valued (i.e. the case of SCTarget), it cannot be accessed in its entirety, but only one of its
members through indexing. The function objToTabExpr transforms the index expression oe that contains 
paths of type OPATH to te that contains paths of type TPATH. The function objToTab transforms an object 
model om to an equivalent table model tm. 
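
A self-contained variant of this path transformation is sketched below so that the recursion can be run in isolation. It is an approximation under explicit assumptions, not the authors' implementation: TableModel is a hypothetical record listing which properties live in which kind of table, REF_START is reduced to the class of the current object, index expressions are integer literals passed through unchanged, any property not listed elsewhere is assumed to live in its class table, and the base case for BaseOPath (not shown in the text) is assumed to copy the starting reference.

```haskell
type IDEN_PROPERTY = (String, String)   -- (class, property)

newtype EXPRESSION = IntLit Int deriving (Eq, Show)
newtype REF_START  = ThisRef String deriving (Eq, Show)  -- simplified: class of `this`

data TARGET = EntityTarget IDEN_PROPERTY | SCTarget IDEN_PROPERTY EXPRESSION
  deriving (Eq, Show)
data OPATH  = BaseOPath REF_START | RecOPath OPATH TARGET
  deriving (Eq, Show)

data T_ACCESS
  = ClassTAccess IDEN_PROPERTY
  | AssocTAccess IDEN_PROPERTY
  | SetTAccess   IDEN_PROPERTY
  | SeqTAccess   IDEN_PROPERTY
  | SeqTCAccess  IDEN_PROPERTY EXPRESSION
  deriving (Eq, Show)
data TPATH = BaseTPath REF_START | RecTPath TPATH T_ACCESS
  deriving (Eq, Show)

-- Hypothetical table model: which properties are stored in which kind of table.
data TableModel = TableModel
  { assocProps, setProps, seqProps :: [IDEN_PROPERTY] }

toTabPath :: TableModel -> OPATH -> TPATH
toTabPath _  (BaseOPath start) = BaseTPath start           -- assumed base case
toTabPath tm (RecOPath op tar) = RecTPath (toTabPath tm op) access
  where
    access = case tar of
      SCTarget ip e -> SeqTCAccess ip e                    -- one indexed component
      EntityTarget ip
        | ip `elem` assocProps tm -> AssocTAccess ip
        | ip `elem` setProps tm   -> SetTAccess ip
        | ip `elem` seqProps tm   -> SeqTAccess ip
        | otherwise               -> ClassTAccess ip       -- assumed default
```

Applied to the object path for this.owner.reglist, with both properties registered as association properties, the function yields the table path shown in the example above, with an AssocTAccess at each step.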

SQL database statements express paths via (nested) SELECT queries. For example, the above Table
path has its SQL statement counterpart:

SELECT (VAR 'reglist')
       (TABLE 'Hotel_registered_Traveller_reglist')
       (VAR 'oid' = (SELECT (VAR 'owner')
                            (TABLE 'Account_owner_Traveller_account')
                            (VAR oid = VAR this)))

where oid is the default column (declared as the primary key) for each table that implements an association.
We can show by structural induction that the transformations from BPATH to OPATH, from
OPATH to TPATH, and from TPATH to SELECT statements are correct.
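
The nesting of SELECT queries along a path can be illustrated with a small query type and a renderer. This is a sketch only: Query, Cond, render and navigate are hypothetical names, the concrete SQL text syntax is an assumption, and each navigation step is assumed to go through an association table keyed on oid.

```haskell
-- A tiny query representation, invented for this sketch.
data Query = Select String String Cond   -- column, table, condition
data Cond
  = EqVar String String                  -- column = variable
  | EqSub String Query                   -- column = (subquery)

render :: Query -> String
render (Select col tab c) =
  "SELECT " ++ col ++ " FROM " ++ tab ++ " WHERE " ++ renderCond c

renderCond :: Cond -> String
renderCond (EqVar col v) = col ++ " = " ++ v
renderCond (EqSub col q) = col ++ " = (" ++ render q ++ ")"

-- Navigate from a start variable through association tables, given as
-- (table, column) steps in navigation order. Each later step wraps the
-- query built so far as a subquery on oid.
navigate :: String -> (String, String) -> [(String, String)] -> Query
navigate start (t0, c0) rest = foldl step (Select c0 t0 (EqVar "oid" start)) rest
  where
    step inner (tab, col) = Select col tab (EqSub "oid" inner)
```

For the path this.owner.reglist, navigate "this" ("Account_owner_Traveller_account", "owner") [("Hotel_registered_Traveller_reglist", "reglist")] builds the same nesting as the statement above: the owner lookup becomes the innermost query.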

Expression transformation. We transform both predicates and expressions on the Table model into SQL
expressions:

toSqlExpr  :: TABLE_MODEL -> Predicate -> SQL_EXPR
toSqlExpr' :: TABLE_MODEL -> Expression -> SQL_EXPR

Some transformations are direct:

toSqlExpr tm (And p q) = toSqlExpr tm p `AND` toSqlExpr tm q

whereas others require an equivalent construction:

toSqlExpr' tm (Card e) | isPathExpr e = SELECT [COUNT (VAR oid)] (toSqlExpr' tm e) TRUE
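
The distinction between direct translations and equivalent constructions can be seen in a cut-down version of the two functions. The sketch below is an assumption-laden simplification: it omits the table model argument, uses hypothetical names (predToSql, exprToSql, SqlExpr and its constructors), and treats a path expression as the name of the table that stores it.

```haskell
-- Simplified source expressions and predicates (hypothetical).
data Expr = PathE String | IntE Int | Card Expr deriving (Eq, Show)
data Pred = And Pred Pred | Lt Expr Expr | TrueP deriving (Eq, Show)

-- Simplified target SQL expressions (hypothetical).
data SqlExpr
  = SqlAnd SqlExpr SqlExpr
  | SqlLt SqlExpr SqlExpr
  | SqlCount SqlExpr        -- stands for SELECT COUNT(oid) over a subquery
  | SqlTable String
  | SqlInt Int
  | SqlTrue
  deriving (Eq, Show)

-- Predicates: logical connectives map directly onto their SQL counterparts.
predToSql :: Pred -> SqlExpr
predToSql (And p q) = SqlAnd (predToSql p) (predToSql q)
predToSql (Lt a b)  = SqlLt (exprToSql a) (exprToSql b)
predToSql TrueP     = SqlTrue

-- Expressions: cardinality has no direct counterpart and becomes a COUNT query.
exprToSql :: Expr -> SqlExpr
exprToSql (PathE t) = SqlTable t
exprToSql (IntE n)  = SqlInt n
exprToSql (Card e)  = SqlCount (exprToSql e)
```

Keeping predicates and expressions as separate source types, as the text does with toSqlExpr and toSqlExpr', lets the type checker rule out ill-formed inputs such as the cardinality of a conjunction.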

5 Assignment Transformation 

The most important aspect of the model transformation is the handling of attribute assignments and
collection updates. There are 36 distinct cases to consider, given the different combinations of attributes
and bidirectional (opposite) associations. We present a single, representative case in Table 1, for an
association between an optional attribute (multiplicity 0..1) and a sequence-valued attribute (ordered
with multiplicity *).



Bi-Assoc. Decl.    | #  | GSL Substitution        | SQL Queries
-------------------+----+-------------------------+------------------------------------------------------
seq-to-opt         | 23 | bs := ins(bs, i, that)  | UPDATE t SET index = index + 1
class A            |    | ||                      |   WHERE ao = this AND index >= i;
  bs: seq(B.ao)    |    | that.ao := this         | INSERT INTO t (bs, ao, index) VALUE (that, this, i);
class B            |    |                         |
  ao: [A.bs]       |    |                         |

Table 1: Assignment Transformation Pattern for sequence-to-optional Bi-Association



From left to right, the columns of the table present declarations of properties, numerical identifiers 
of patterns, their abstract implementation in the substitution program, and their concrete implementation 
in database queries. The dummy variables this and that are used to denote instances of, respectively, the
current class and the other end of the association.

For each case (for each row of the completed table), we define a transformation function toSqlProc 
that turns a substitution into a list of SQL query statements. 

toSqlProc tm _ s@(Assign _ _) = transAssign tm s
transAssign :: TABLE_MODEL -> Substitution -> [STATEMENT]

The function toSqlProc delegates the task of transforming base cases, i.e. assignments, to another 
auxiliary function transAssign that implements the 36 patterns. The recursive cases of toSqlProc are 
straightforward. For example, to implement a guarded substitution, we transform it into an IfThenElse 
pattern that is directly supported in the SQL domain; and to implement iterators (ALL, ANY), we instantiate 
a loop pattern, declared with an explicit variant, that is guaranteed to terminate. 
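
The intended effect of the sequence-to-optional pattern (an UPDATE that shifts indices, followed by an INSERT) can be checked against a pure model of the association table. The following sketch is an illustration under assumptions, not generated code: Row, shiftFrom and insertRow are hypothetical names, the table is restricted to the rows of a single owning object, and the shift is assumed to apply to every row whose index is at or beyond the insertion position.

```haskell
import Data.List (sortOn)

-- Rows of the association table for one fixed owner (ao = this):
-- the referenced element and its position in the sequence.
type Row = (String, Int)

-- Models the UPDATE: make room at position i.
shiftFrom :: Int -> [Row] -> [Row]
shiftFrom i = map (\(b, k) -> if k >= i then (b, k + 1) else (b, k))

-- Models the UPDATE followed by the INSERT, returning rows ordered by index.
insertRow :: String -> Int -> [Row] -> [Row]
insertRow b i rows = sortOn snd ((b, i) : shiftFrom i rows)
```

For instance, inserting "r3" at position 2 into the rows [("r1", 1), ("r2", 2)] moves "r2" to position 3 and leaves the indices contiguous, which is what the abstract substitution bs := ins(bs, i, that) requires.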



6 Correctness Proofs 



The correctness of both the BOOSTER-to-OBJECT and OBJECT-to-TABLE transformations can be established
by constructing a relational model mapping identifiers and paths to references and primitive values,
and then showing that the different reference mechanisms identify the same values in each case. To
prove the correctness of the TABLE-to-SQL transformation (shown as the vertical, thick arrow in Figure 2),
we need also to introduce linking invariants between model states. We first formalise states
and operations for each model domain. In the Z notation, sets and relations may be described using a
schema notation, with separate declaration and constraint components and an optional name:



\begin{schema}{name}
  declaration
\where
  constraint
\end{schema}



Either component may include schema references, with the special reference $\Delta$ denoting two copies of
another schema, typically denoting before- and after-versions, the attributes of the latter being decorated
with a prime ('). The remainder of the mathematical notation is that of standard, typed, set theory.
We map the state of the Object model into a relational semantics $\mathcal{S}_{obj}$, characterised by:

\begin{schema}{\mathcal{S}_{obj}}
  OBJECT\_MODEL \\
  extent : N\_CLASS \pfun \power ObjectId \\
  value : ObjectId \pfun N\_PROPERTY \pfun Value
\where
  \dom extent = \dom class \\
  \forall c : N\_CLASS;\ o : ObjectId \mid c \in \dom extent \land o \in extent(c) \bullet \\
  \quad \dom(value(o)) = \dom((class\ c).property)
\end{schema}



The inclusion of OBJECT_MODEL (whose details are omitted here) enables us to constrain the two 
mappings according to the type system of the object model in question. Value denotes a structured type 
that encompasses the possibilities of undefined value (for optional properties), primitive value, and set 
and sequence of values. 
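
The two constraints on the object state can be read as an executable well-formedness check. The sketch below is a loose Haskell rendering under stated assumptions: State, classProps and wellFormed are hypothetical names, the object model is reduced to a map from class names to declared property names, and Value is a simplified version of the structured type described above.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type ClassName = String
type PropName  = String
type ObjectId  = Int

-- Simplified Value: undefined, primitive, set, or sequence.
data Value = Undefined | Prim String | SetV [Value] | SeqV [Value]
  deriving (Eq, Show)

data State = State
  { classProps :: M.Map ClassName (S.Set PropName)     -- from the object model
  , extent     :: M.Map ClassName (S.Set ObjectId)
  , value      :: M.Map ObjectId (M.Map PropName Value)
  }

-- dom extent = dom class, and every object in extent(c) defines exactly
-- the properties declared for class c.
wellFormed :: State -> Bool
wellFormed st =
  M.keysSet (extent st) == M.keysSet (classProps st)
    && and [ fmap M.keysSet (M.lookup o (value st)) == M.lookup c (classProps st)
           | (c, os) <- M.toList (extent st), o <- S.toList os ]
```

An object that appears in an extent but has no entry in value, or whose entry defines a different set of properties from its class, makes wellFormed return False.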

The state of a table model will be composed of: 1) the type system of the object model in context; 
and 2) functions for querying the state of such a context object model. More precisely, 
\begin{schema}{TABLE\_MODEL}
  OBJECT\_MODEL \\
  nTableModel : N\_MODEL \\
  assocTables, setTables : \power (N\_CLASS \cross N\_PROPERTY)
\end{schema}



where assocTables, setTables and seqTables are reflective queries: for example, assocTables returns 
the set of attributes/properties (and their context classes) that are stored in the association tables. We 
formalise the Table model state as: 

\begin{schema}{\mathcal{S}_{tab}}
  \mathcal{S}_{obj} \\
  TABLE\_MODEL
\end{schema}



For each instance of $\mathcal{S}_{obj}$, there should be a corresponding configuration of TABLE_MODEL. A SQL
database corresponds to a set of named tables, each containing a set of column-value mappings:

\begin{schema}{\mathcal{S}_{sql}}
  tuples : N\_TABLE \pfun \power Tuple
\end{schema}

\begin{schema}{Tuple}
  values : N\_COLUMN \pfun ScalarValue
\end{schema}



We use ScalarValue to denote the collection of basic types: strings, integers, and Booleans. We require 
mapping functions to retrieve values from TABLE and SQL: 

<dt : y ta b x (NClass x NProperty) -+>■ F(Value x Value) 

These return reference-value pairs for each kind of property. For example, set-valued properties are 
returned by 

^ set == X s : ,5^ ta b\ p : NClass x NProperty • 

o : Objectld; v : Value; vs : ¥ Primitive \ 
|J 1 o G s. extent (fstp) A v = s.value (o) (sndp) A v = setValue (vs) • 
{v ! :vs.[[of v ^[[v ! f v } 



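The set of mappings for a particular table is given by

$$\lambda s : \mathcal{S}_{sql};\ n : NTable;\ c_1, c_2 : NColumn \bullet \{\, row : s.tuples(n) \bullet row.values(c_1) \mapsto row.values(c_2) \,\}$$

and the necessary linking invariant is:

\begin{schema}{Table \rel Sql}
  \mathcal{S}_{tab} \\
  \mathcal{S}_{sql}
\where
  C
\end{schema}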



where C comprises six conjuncts, one for each possible unordered combination of association end mul- 
tiplicities. 

Each operation is implemented as an atomic transaction. $\mathcal{O}_{obj}$ represents the formal context, with the
effect upon the state being described as a binary relation ($\rel$).

\begin{schema}{\mathcal{O}_{obj}}
  input : \power N\_VARIABLE \\
  output : \power N\_VARIABLE \\
  effect : (\mathcal{S}_{obj} \cross IO_{obj}) \rel (\mathcal{S}_{obj} \cross IO_{obj})
\where
  effect \in \mathcal{S}_{obj} \cross (input \fun Value) \rel \mathcal{S}_{obj} \cross (output \fun Value)
\end{schema}



Each element of $IO_{obj} == N\_VARIABLE \pfun Value$ represents a collection of inputs and outputs.

Using $\mathcal{S}_{obj}$ and $\mathcal{O}_{obj}$, we may write $System_{obj}$ to denote the set of possible object system configurations,
each characterised through its current state (of type $\mathcal{S}_{obj}$) and its set of indexed operations (of type
$\mathcal{O}_{obj}$). More precisely,

\begin{schema}{System_{obj}}
  state : \mathcal{S}_{obj} \\
  relation : N\_CLASS \pfun N\_OPERATION \pfun \mathcal{O}_{obj}
\end{schema}



We will describe the effect of a primitive assignment (:=), and use this as the basis for a recursive
definition of effect, based on the grammar of the GSL notation. If AssignInput is the schema [path? :
PATH; e? : Expression], then we may define

\begin{schema}{AssignEffect}
  s, s' : \mathcal{S}_{obj} \\
  AssignInput
\where
  s.nObjModel = s'.nObjModel \\
  s.sets = s'.sets \\
  s.classes = s'.classes \\
  s.extent = s'.extent \\
  \mathbf{let}\ p == target[\![path?]\!];\ o == context[\![path?]\!] \bullet \\
  \quad s'.value = s.value \oplus \{ o \mapsto s.value(o) \oplus \{ p \mapsto eval(e?) \} \}
\end{schema}



The input path? can be either an OPATH or a TPATH: for the former, the other input expression e? involves
paths, if any, of type OPATH; for the latter, of type TPATH. The (let es • p) expression, where es consists of a
list of expression-to-variable bindings, denotes a predicate p on the variables of es.
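
The nested override in AssignEffect, in which the value map is overridden at object o by that object's property map overridden at property p, corresponds to a nested map update. The following is a minimal sketch, assuming values are plain integers and that the path has already been resolved to a context object and target property; Store and assign are hypothetical names.

```haskell
import qualified Data.Map as M

type ObjectId = Int
type PropName = String

-- Simplified value component of the state: values are plain integers here.
type Store = M.Map ObjectId (M.Map PropName Int)

-- s'.value = s.value (+) { o |-> s.value(o) (+) { p |-> v } }
-- M.adjust leaves the store unchanged when o is not in its domain.
assign :: ObjectId -> PropName -> Int -> Store -> Store
assign o p v = M.adjust (M.insert p v) o
```

Every other object, and every other property of o, is untouched, mirroring the frame conditions on extent and the remaining state components in the schema.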

We start by relating domains of the Object model and Table model, where assignment paths are
specified in, respectively, OPATH and TPATH (Figure 3). In the Object model domain, an assignment is
parameterised by a path of type OPATH and an expression that consists of paths, if any, of the consistent
type. We formalise each Object model assignment under the formal context of $\mathcal{O}_{obj}$, by defining its
effect mapping through the constraint of AssignEffect and by requiring that the sets of external inputs and
outputs are empty.

\begin{schema}{Assign_{obj}}
  \mathcal{O}_{obj} \\
  op? : OPATH \\
  oe? : Expression
\where
  \forall s, s' : \mathcal{S}_{obj};\ AssignInput \mid path? = OPath(op?) \land e? = oe? \bullet \\
  \quad AssignEffect \iff (s, \{\}) \mapsto (s', \{\}) \in effect
\end{schema}



The characterisation $Assign_{tab}$ of an assignment in the Table model domain is similar to that of
$Assign_{obj}$, except that the target is now of type TPATH, and the source is now of type Expression. We may
then map our extended GSL substitution into a relation:

$$[\![\_]\!]_{obj} : Substitution \fun ((\mathcal{S}_{obj} \cross IO_{obj}) \rel (\mathcal{S}_{obj} \cross IO_{obj}))$$

of the same type as the effect component of $\mathcal{O}_{obj}$. Given a Table path tp? and an expression te?, we
represent the assignment substitution tp? := te? by the effect relation of $Assign_{tab}$ that exists uniquely
with respect to tp? and te?. More precisely,

$$[\![ tp? := te? ]\!]_{obj} = (\mu\, Assign_{tab}).effect$$

where $\mu\, Assign_{tab}$ denotes the unique instance of $\mathcal{S}_{tab}$ such that the constraint as specified in $Assign_{tab}$
holds, and .effect retrieves its relational effect on the model state. The definition of $Assign_{tab}$ is very
similar to that of $Assign_{obj}$, except that the input path is constrained as path? = TPath(...).

We interpret a guarded substitution $g \longrightarrow S$ as a relation that has the same effect as $[\![S]\!]_{obj}$ within the 
domain of satisfying states of guard g (denoted as $[\![g]\!]^{states}_{obj}$); otherwise, it behaves like skip, since it is 
blocked and cannot achieve anything. More precisely, we have: 

$$[\![g \longrightarrow S]\!]_{obj} = id(\mathcal{S}_{obj} \times IO_{obj}) \oplus ([\![g]\!]^{states}_{obj} \lhd [\![S]\!]_{obj})$$

Similar rules may be defined for other combinators. 
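
As a rough illustration of the guarded-substitution rule, the following sketch encodes relations as Python sets of pairs over a small, hypothetical state space (integers 0 to 3). It ignores the input/output component, and the helper names are assumptions made for this example only.

```python
def identity(states):
    """The identity relation on a finite set of states (the effect of skip)."""
    return {(s, s) for s in states}


def dom_restrict(dom, rel):
    """Domain restriction: keep only pairs whose first element lies in dom."""
    return {(a, b) for (a, b) in rel if a in dom}


def override(r, q):
    """Relational override: q wins wherever it is defined, r applies elsewhere."""
    q_dom = {a for (a, _) in q}
    return {(a, b) for (a, b) in r if a not in q_dom} | q


def guarded(states, guard, body):
    """Effect of a guarded substitution: body inside the guard, skip outside it."""
    return override(identity(states), dom_restrict(guard, body))


# Hypothetical example: increment a counter, guarded by "counter < 2".
states = {0, 1, 2, 3}
increment = {(s, s + 1) for s in states if s + 1 in states}
guard = {s for s in states if s < 2}
rel = guarded(states, guard, increment)
print(sorted(rel))
```

States that satisfy the guard are mapped by the body; the remaining states are related only to themselves, which matches the reading of a blocked operation as skip.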

Each transaction is composed of SQL queries and, as for $\mathcal{C}_{obj}$, we collect its list of inputs upon its 
initiation and produce its list of outputs upon its completion. We use $\mathcal{C}_{sql}$ to denote this formal 
context, under which the transformational effect on the state of the database is defined as a 
function, reflecting the fact that the database implementation is deterministic in its effect. 

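$\mathcal{C}_{sql}$ 

$$
\begin{array}{l}
input, output : \mathbb{P}\, N\_VARIABLE \\
effect : (\mathcal{S}_{sql} \times IO_{sql}) \rightarrow (\mathcal{S}_{sql} \times IO_{sql}) \\
\hline
this \in input
\end{array}
$$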



The mechanism of referencing the current object (via this) is simulated by providing, by default, the 
value of this for each generated stored procedure or function. We model inputs and outputs in the same 
way as we do for $IO_{obj}$, except that the range of values is now of type ScalarValue. 

For each SQL statement, we assign to it a relational semantics by mapping it to a relation on states 
(of type $\mathcal{S}_{sql}$). This is a similar process to that for $[\![\_]\!]_{obj}$. More precisely, we define: 

$$[\![\_]\!]_{sql} : Statement \rightarrow ((\mathcal{S}_{sql} \times IO_{sql}) \leftrightarrow (\mathcal{S}_{sql} \times IO_{sql}))$$

And since a SQL stored procedure is defined as a sequential composition, we also define 

$$[\![\_]\!]_{seqsql} : \mathrm{seq}\ Statement \rightarrow ((\mathcal{S}_{sql} \times IO_{sql}) \leftrightarrow (\mathcal{S}_{sql} \times IO_{sql}))$$

to derive its effect through combining those of its component statements via relational composition. For 
primitive query statements, we refer to their schema definitions. For example, we have: 

$$[\![\mathtt{UPDATE}\ t\ \mathtt{SET}\ sets\ \mathtt{WHERE}\ cond]\!]_{sql} = (\mu\, UPDATE).effect$$

where the state effect of the query (UPDATE table? SET sets? WHERE cond?) is formally specified in a schema 
named UPDATE. The UPDATE query modifies those tuples in a table that satisfy a condition, and takes 
as inputs table?, a table name; sets?, a mapping that specifies how relevant columns should be modified; 
and cond?, a Boolean condition that chooses the range of tuples to be modified. The schema UPDATE is 
defined similarly to $Assign_{tab}$, except that it imposes constraints on the model state $\mathcal{S}_{sql}$. We formalise 
an IF ... THEN ... ELSE ... statement as the union of the semantic interpretations of the two sequences of 
statements in its body, each suitably restricted on its domain. 

$$[\![\mathtt{IF}\ b\ \mathtt{THEN}\ stmts_1\ \mathtt{ELSE}\ stmts_2]\!]_{sql} = ([\![b]\!]^{states}_{sql} \lhd [\![stmts_1]\!]_{seqsql}) \cup ([\![\mathtt{NOT}\ b]\!]^{states}_{sql} \lhd [\![stmts_2]\!]_{seqsql})$$

where $[\![b]\!]^{states}_{sql}$ denotes the set of satisfying states of a SQL expression b. 

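To define the semantics of a WHILE loop, we intend for the following equation to hold 

$$[\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql} = [\![\mathtt{IF}\ b\ \mathtt{THEN}\ stmts \frown \langle \mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE} \rangle\ \mathtt{ELSE}\ \langle\rangle]\!]_{sql}$$

where $\frown$ is the operator for sequence concatenation. By applying the definition of $[\![\_]\!]_{sql}$ on 
IF ... THEN ... ELSE ... and $[\![\_]\!]_{seqsql}$ on $\langle\rangle$, we have 

$$
\begin{array}{rl}
[\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql} = & ([\![b]\!]^{states}_{sql} \lhd ([\![stmts]\!]_{seqsql} \mathbin{;} [\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql})) \\
& \cup \\
& ([\![\mathtt{NOT}\ b]\!]^{states}_{sql} \lhd id(\mathcal{S}_{sql} \times IO_{sql}))
\end{array}
$$

Let us define a function 

$$F(X) = ([\![b]\!]^{states}_{sql} \lhd ([\![stmts]\!]_{seqsql} \mathbin{;} X)) \cup ([\![\mathtt{NOT}\ b]\!]^{states}_{sql} \lhd id(\mathcal{S}_{sql} \times IO_{sql}))$$

When $X = [\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql}$, we obtain 

$$[\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql} = F([\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql})$$

which means that $[\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql}$ should be a fixed point of function F. The least fixed 
point (LFP) of function F, i.e. $\bigcup_{n \in \mathbb{N}} F^n(\emptyset)$, exists by Kleene's fixed-point theorem, since F is easily 
provable to be continuous. We choose this LFP of F for the value of $[\![\mathtt{WHILE}\ b\ \mathtt{DO}\ stmts\ \mathtt{END\ WHILE}]\!]_{sql}$. 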
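
The least fixed-point construction can be replayed on a finite state space by iterating F from the empty relation until it stabilises. The sketch below makes several assumptions: states are plain integers, relations are sets of pairs, the input/output component is omitted, and all function names are illustrative.

```python
def compose(r, q):
    """Relational composition: first r, then q."""
    return {(a, c) for (a, b) in r for (b2, c) in q if b == b2}


def dom_restrict(dom, rel):
    """Domain restriction of rel to the states in dom."""
    return {(a, b) for (a, b) in rel if a in dom}


def while_semantics(states, cond, body):
    """Least fixed point of F(X) = (cond <| (body ; X)) U (not-cond <| id)."""
    exit_part = {(s, s) for s in states - cond}  # loop exits immediately

    def F(x):
        return dom_restrict(cond, compose(body, x)) | exit_part

    approx = set()               # F^0 applied to the empty relation
    while True:
        nxt = F(approx)          # next approximant in the Kleene chain
        if nxt == approx:        # chain has stabilised: this is the LFP
            return approx
        approx = nxt


# Hypothetical loop: WHILE s > 0 DO s := s - 1 END WHILE, over states 0..4.
states = set(range(5))
cond = {s for s in states if s > 0}
body = {(s, s - 1) for s in states if s > 0}
loop = while_semantics(states, cond, body)
print(sorted(loop))
```

On a finite state space the ascending chain of approximants must stabilise, so the iteration terminates; pre-states from which the loop would never exit simply do not appear in the domain of the result.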

We are now able to establish the correctness of the transformation with respect to the linking 
invariant. The commuting diagram of Figure [4] shows how a substitution program prog and its context 
Table model (i.e. $\theta\, TableModel$) are mapped by the transformation $toSqlProc(\theta\, TableModel)(prog)$ to 
produce an SQL implementation. The linking invariant holds for the before states, $Table \leftrightarrow Sql$, and 
for the after states, $(Table \leftrightarrow Sql)'$. We then establish that for each state transformation, characterised by 
the relational effect of the SQL code generated from prog, there is at least one corresponding state 
transformation, characterised by the relational effect of the Table program, $[\![prog]\!]_{obj}$. This is an example of 
simulation between abstract data types [19]. 

We use a universal quantification $(\forall x \mid R(x) \bullet P(x))$ to state our correctness criterion: the x part 
declares variables, the R(x) part constrains the range of state values, and the P(x) part states our concern. 
Schemas defined above (i.e. $\mathcal{S}_{sql}$ and $Table \leftrightarrow Sql$) are used as both declarations and predicates. 
If we declare 

TransInput 

$$
\begin{array}{l}
TableModel \\
prog : Substitution
\end{array}
$$



[Figure 4, commuting diagram: $[\![prog]\!]_{obj}$ maps the Object pre-state to the Object post-state; $[\![toSqlProc(\theta\, TableModel)(prog)]\!]_{seqsql}$ maps the Sql pre-state to the Sql post-state; the pre-states are linked by $Table \leftrightarrow Sql$ and the post-states by $(Table \leftrightarrow Sql)'$.] 

Figure 4: Correctness of Model Transformation 



to represent the inputs to the transformation, then 

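$$
\begin{array}{l}
\forall\, TransInput;\ \Delta\mathcal{S}_{obj};\ \Delta\mathcal{S}_{sql} \mid \\
\quad Table \leftrightarrow Sql \land [\![toSqlProc(\theta\, TableModel)(prog)]\!]_{seqsql} \bullet (\exists\, \mathcal{S}'_{obj} \bullet (Table \leftrightarrow Sql)' \land [\![prog]\!]_{obj})
\end{array}
$$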

With the relational semantics outlined above, we may establish this result through a combination of case 
analysis and structural induction. 
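
For intuition, the shape of this criterion can be checked exhaustively on a toy model. The following sketch is not the proof technique of the paper: it assumes a hypothetical abstract state space of finite sets, a concrete state space of sorted tuples standing in for table rows, and a linking relation between the two; all names are illustrative.

```python
from itertools import combinations


def simulates(link, abstract_rel, concrete_rel):
    """Check: for linked pre-states, every concrete step has a linked abstract step."""
    abstract_states = {a for (a, _) in link}
    for (a, c) in link:
        for (c_pre, c_post) in concrete_rel:
            if c_pre != c:
                continue
            # Look for an abstract post-state that is both reachable and linked.
            witnessed = any(
                (a, a_post) in abstract_rel and (a_post, c_post) in link
                for a_post in abstract_states
            )
            if not witnessed:
                return False
    return True


# Hypothetical model: abstract states are subsets of {1, 2}; concrete states
# are the corresponding sorted tuples.
universe = [1, 2]
subsets = [frozenset(c) for n in range(len(universe) + 1)
           for c in combinations(universe, n)]
link = {(s, tuple(sorted(s))) for s in subsets}

abstract_add = {(s, s | {1}) for s in subsets}                       # add element 1
concrete_add = {(tuple(sorted(s)), tuple(sorted(s | {1}))) for s in subsets}
concrete_bad = {(tuple(sorted(s)), ()) for s in subsets}             # wrongly clears rows

print(simulates(link, abstract_add, concrete_add))
print(simulates(link, abstract_add, concrete_bad))
```

The first check succeeds because each concrete insertion is matched by the abstract addition; the second fails because clearing the rows has no abstract counterpart that re-establishes the link.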

7 Example Implementation 

Consider the implementation, on a relational database platform, of the operation reserve introduced in 
Section [2]. Having translated the object model into a collection of database tables, the generation 
process will produce a stored procedure for each operation. The guard for reserve requires that the current 
number of allocations — characterised through the cardinality of the set-valued attribute allocations — is 
below a specific bound. We might include such a condition, for example, to ensure that the memory or 
storage requirements of the system remain within particular bounds; this may not be an issue for a hotel 
reservation system, but is a realistic concern in critical systems development. In the implementation, a 
stored function is generated that will establish whether or not the guard constraint holds for the current 
state, together with any input values. The remainder of the generated code will achieve the effect speci- 
fied in the original operation constraint, translated into the representation, or orientation, of the database 
platform. 

Class Reservation has status as an attribute, and this is stored in the corresponding class table. In 
the function, AUTO_INCREMENT allows the target SQL platform to generate a unique identifier for each 
inserted row. Set-valued properties, like attribute dates in class Reservation, are stored in separate 
tables, with an oid column to identify the current object in a given method call. Associations such as 
host and reservations are stored in separate tables, with an oid column to identify the exact association 
instance. Since attribute reservations is also sequence-valued, an index column is required. 



Schema of Tables Updated by 'reserve' 

CREATE TABLE `Reservation` (`oid` INTEGER AUTO_INCREMENT, PRIMARY KEY (`oid`), `status` CHAR(30)); 
CREATE TABLE `Room_reservations_Reservation_room` (`oid` INTEGER AUTO_INCREMENT, 
    PRIMARY KEY (`oid`), `reservations` INTEGER, `room` INTEGER, `index` INTEGER); 

We also generate integrity constraints for association tables: although the generated procedures are 
guaranteed to preserve semantic integrity, this affords some protection against errors in the design of 
additional, manually-written procedures. 

The value of the model-driven approach should be apparent following a comparison of the original 
specification for reserve with the fragments of the following SQL implementation. Manual production 
of queries that need to take account of a large number of complex model constraints (as well 
as, for example, constraints arising from data caching strategies) is time-consuming and error-prone. 
Furthermore, we may expect the design of a system to evolve during use: the challenge of maintaining 
correctness in the face of changing specifications (and platforms) adds another dimension of complexity 
to systems development; some degree of automation, in production and in proof, is essential. 

In the following, variable names have been preserved from the BOOSTER domain, e.g. the input and 
output parameters dates? and r! at line 2, as well as caching variables `r!.status`, `r!.host`, and 
`r!.room` at line 4. Meta-variables are used to implement the ALL iterator in method reserve: Line 
5 declares, respectively, the bound variable `x` and the loop variant `x_variant`, and Line 6 
declares a cursor over the set-valued input dates?. 

Queries Implementing 'reserve': Declarations 

CREATE PROCEDURE `Hotel_reserve` (IN `this?` INTEGER, 
    IN `dates?` CHAR(30), IN `m?` INTEGER, OUT `r!` INTEGER) 
BEGIN 
    DECLARE `r!.status` CHAR(30); DECLARE `r!.host` INTEGER; DECLARE `r!.room` INTEGER; 
    DECLARE `x` Date; DECLARE `x_variant` INTEGER; 
    DECLARE `x_cursor` CURSOR FOR (SELECT * FROM `dates?` WHERE TRUE); 



Line 7 first creates a new instance of Reservation by inserting, for output r!, a row formatted as 
(oid, ...) into the appropriate class table, where oid is a unique value generated by the built-in function 
last_insert_id(), with the guarantee that each subsequent call to this function returns a new value. It 
then assigns this unique identifier to r! for queries in later fragments to refer to. 

Queries Implementing 'reserve': Creating an Empty Output 

INSERT INTO `Reservation` () VALUE (); SET `r!` = last_insert_id(); 



In Lines 8 to 10 the pair of DROP TEMPORARY TABLE and CREATE TEMPORARY TABLE queries update the 
value of a cache variable `m?.reservations` that denotes a multi-valued property: this kind of caching 
is useful in large database implementations. In Line 11 we update the caching variable `r!.host` of 
single-valued types of properties through a SELECT INTO query. We cache the value of attribute host 
possessed by the reservation r!. Any later paths with `r!.host` or `m?.reservations` as a prefix will 
be able to use the cached value directly without re-evaluation. 

Queries Implementing 'reserve': Updating Caching Vars 

DROP TEMPORARY TABLE IF EXISTS `m?.reservations`; 
CREATE TEMPORARY TABLE `m?.reservations` AS 
    SELECT `reservations` FROM `Room_reservations_Reservation_room` WHERE `room` = `m?`; 
SELECT `status` INTO `r!.status` FROM `Reservation` WHERE `oid` = `r!`; 



Lines 12 to 20 instantiate a finite loop pattern. In Line 12 we activate the declared cursor and 
fetch its first available value. In Line 13 we also calculate the size of the data set that the cursor will 
iterate over and use it as the variant of the loop defined in Lines 14 to 20. The exit condition (Line 14) is 
characterised through decreasing the value of x_variant (via the 2nd statement in Line 19); the bound 
variable x is updated to the next data item at the end of each iteration (via the 1st statement in Line 19). 
In each iteration of the loop, from Lines 15 to 17 we re-cache the value of the set-valued path r!.dates, 
in case there are other paths which contain it as a prefix and are used later in the loop. In Line 18 
we perform the first substitution in the specification of method reserve: we implement the substitution 
r!.dates := r!.dates \/ dates? by iterating through the input dates? with a bound variable `x`. 



Queries Implementing 'reserve': Terminating Loop 

12 OPEN `x_cursor`; FETCH `x_cursor` INTO `x`; 
13 SELECT COUNT(*) INTO `x_variant` FROM `dates?` WHERE TRUE; 
14 WHILE (`x_variant`) > (0) DO 
15   DROP TEMPORARY TABLE IF EXISTS `r!.dates`; 
16   CREATE TEMPORARY TABLE `r!.dates` AS 
17     SELECT `dates` FROM `Reservation_dates` WHERE `oid` = `r!`; 
18   INSERT INTO `Reservation_dates` (`oid`, `dates`) VALUE (`r!`, `x`); 
19   FETCH `x_cursor` INTO `x`; SET `x_variant` = `x_variant` - 1; 
20 END WHILE; CLOSE `x_cursor`; 
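
The finite loop pattern (cursor, bound variable, and decreasing variant) can be mimicked outside the database. The sketch below is an analogy only: it assumes a Python iterator in place of the SQL cursor and a set in place of the Reservation_dates table, and the function name is hypothetical.

```python
def union_into(target, source):
    """Compute target := target U source using a cursor and a loop variant."""
    cursor = iter(source)        # OPEN the cursor over the input collection
    x = next(cursor, None)       # FETCH the first available value
    variant = len(source)        # number of items the cursor will visit
    while variant > 0:           # exit once the variant reaches zero
        target.add(x)            # insert the current item
        x = next(cursor, None)   # FETCH the next item (None when exhausted)
        variant -= 1             # strictly decreasing variant
    return target


# Hypothetical data: one existing date and two requested dates.
dates = union_into({"2012-01-01"}, ["2012-01-02", "2012-01-03"])
print(sorted(dates))
```

Because the variant is initialised to the size of the input and decreases by one in every iteration, the loop runs exactly once per input item and always terminates.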



Line 21 implements the update r!.status := unconfirmed. The two generated query statements, 
located in Lines 22 to 27 and Lines 28 to 31, implement the last two parallel assignments in 
reserve that update the optional-to-sequence association. They correspond exactly to the rules specified 
for pattern 23 in Section [5]. The queries for the middle two parallel assignments in reserve, updating the 
one-to-sequence association, are entirely similar. 



Queries Implementing 'reserve': Performing Updates 

21 UPDATE `Reservation` SET `status` = 'unconfirmed' WHERE (`oid`) = (`r!`); 
22 UPDATE `Room_reservations_Reservation_room` 
23   SET `index` = (`index`) + (1) 
24   WHERE `room` = `m?` AND 
25     `index` >= (SELECT COUNT(`oid`) 
26       FROM (SELECT `reservations` FROM `m?.reservations` WHERE TRUE) AS reservations 
27       WHERE TRUE) + 1; 
28 INSERT INTO `Room_reservations_Reservation_room` (`reservations`, `room`, `index`) VALUE 
29   (`r!`, `m?`, (SELECT COUNT(`oid`) 
30       FROM (SELECT `reservations` FROM `m?.reservations` WHERE TRUE) AS reservations 
31       WHERE TRUE) + 1); 
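
The shift-then-insert pattern for a sequence-valued association can be tried out on a small scale. The sketch below is an approximation under stated assumptions: it uses Python's built-in sqlite3 module rather than the MySQL platform targeted above, a simplified table named assoc, and a direct row count in place of the cached `m?.reservations` table; the function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE assoc (oid INTEGER PRIMARY KEY AUTOINCREMENT, '
    'reservations INTEGER, room INTEGER, "index" INTEGER)'
)


def append_reservation(conn, room, reservation):
    """Append a reservation at the end of the room's sequence."""
    (count,) = conn.execute(
        "SELECT COUNT(oid) FROM assoc WHERE room = ?", (room,)
    ).fetchone()
    position = count + 1
    # Shift any entries at or beyond the insertion position (none when appending).
    conn.execute(
        'UPDATE assoc SET "index" = "index" + 1 WHERE room = ? AND "index" >= ?',
        (room, position),
    )
    # Insert the new association instance at the computed position.
    conn.execute(
        'INSERT INTO assoc (reservations, room, "index") VALUES (?, ?, ?)',
        (reservation, room, position),
    )


append_reservation(conn, 7, 101)
append_reservation(conn, 7, 102)
append_reservation(conn, 8, 201)
rows = conn.execute(
    'SELECT reservations, "index" FROM assoc WHERE room = 7 ORDER BY "index"'
).fetchall()
print(rows)
```

Each room keeps its own contiguous index range, so appending to one room leaves the sequence positions of every other room untouched.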



8 Discussion 

The principal contribution of this paper is the presentation of a practical, formal, model-driven approach 
to the development of critical systems. Both the modelling notation and the target programming language 
are given a formal, relational semantics: the latter only for a specific subset of the language, sufficient 
for the patterns of implementation produced by the code generation process. The generation process is 
formalised as a functional program, easily related to the corresponding transformation on the relational 
semantics. It is perfectly possible to prove the generator correct; indeed, a degree of automatic proof 
could be applied here. The task of system verification is then reduced to the strictly simpler task of 
model verification or validation. 

The implementation platform chosen to demonstrate the approach is a standard means of storing data, 
whether that data was originally described in a hierarchical, a relational, or an object-oriented schema. In 
particular, there are many products that offer a means of mapping EOl from object models (as used here) 
to a relational database implementation: Hibernate Q is perhaps the best-known example. However, 
translating the data model to a data schema is relatively straightforward; the focus here is the generation 
of correct implementations for operations. 

At the same time, much of the work on program transformation is focussed, unsurprisingly, upon 
code rewriting rather than the generation of complete software components with persistent data. The 
work on Vu-X [16], where modifications to a web interface are reflected back into the data model, is 
an interesting exception, but has yet to be extended to a formal treatment of data integrity. The work 
on UnQL [12] supports the systematic development of model transformation through the composition 
of graph-based transformations: this is a powerful approach, but again no similar framework has been 
proposed. 

Some work has been done in precise data modelling in UML, for example [7], but no formal account 
has been given for the proposed translation of operations. The Query/View/Transformation approach [11] 
focuses on design models, but the transformations [13] are described in an imperative, stateful style, 
making proofs of correctness rather more difficult. Recent work on generating provably correct code, 
for example E2l, is restricted to producing primitive getter and setter methods, as opposed to complex 
procedures. Mammar [14] adopts a formal approach to generating relational databases from UML 
models. However, this requires the manual definition of appropriate guards for predefined update methods: 
the automatic derivation of guards, and the automatic generation of methods from arbitrary constraint 
specifications, as demonstrated here, is not supported. 

The unified implementation and semantic framework for transformation (Figure [2]) presented here 
can be applied to any modelling and programming notation that admits such a relational semantics for 
the behaviour of components. It is important to note that the style of this semantics effectively limits the 
approach to the development of sequential data components: that is, components in which interactions 
with important data are managed as exclusive transactions; our semantic treatment does not allow us to 
consider the effects of two or more complex update operations executing concurrently. 

In practice, this is not a significant limitation. Where data is encapsulated within a component, and 
is subject to complex business rules and integrity constraints, we may expect to find locking or caching 
protocols to enforce data consistency in the face of concurrent requests, by means of an appropriate 
sequentialisation. Where concurrency properties are important, they can be addressed using process 
semantics and model-checking techniques; a degree of automatic generation may even be possible, al- 
though this is likely to be at the level of workflows, rather than data-intensive programs. 

Work is continuing on the development of the transformation and generation tools discussed here, 
with a particular emphasis upon the incremental development of operation specifications and models. It 
is most often the case that a precise model will prove too restrictive: when a property is written linking 
two or more attributes, it constrains their interpretation; if one of these attributes is used also elsewhere in 
the model, or within an operation, then that usage may not always be consistent with the now formalised 
interpretation. In our approach, such a problem manifests itself in the unavailability of one or more 
operations, in particular circumstances. 

As a guard is generated for each operation, sufficient to protect any data already acquired, each 
incremental version of the system can be deployed without risk of data loss. It can then be used in practice 
and in earnest, allowing users to determine whether or not the availability — or the overall design — of 
each operation and data view matches their requirements and expectations. Where an operation has 
a non-trivial guard, additional analysis may be required to demonstrate that the resulting availability 
matches requirements: in many cases, the necessary check or test can be automated. The work described 
here provides a sound foundation for this development process. 